SETI Test Set Classification Accuracy

This notebook provides the code needed to calculate the performance of your signal classification models using the PREVIEW test set (see the Step 1. Get Data notebook).


In [1]:
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import numpy as np
import sklearn
import csv
import operator

In [2]:
class_list = ['brightpixel', 'narrowband', 'narrowbanddrd', 'noise', 'squarepulsednarrowband', 'squiggle', 'squigglesquarepulsednarrowband']

In [3]:
fieldnames = ['uuid'] + class_list

In [4]:
#Helper functions for parsing the data and using sklearn to print scoring metrics

def classChooser(listOfDictionaryScores):
    """For each row of class probabilities, return the highest-scoring class."""
    results = []
    for row in listOfDictionaryScores:
        rowscores = dict((k, float(row[k])) for k in class_list)
        maxclass = max(rowscores.items(), key=operator.itemgetter(1))[0]
        results.append({'UUID': row['uuid'], 'SIGNAL_CLASSIFICATION': maxclass})

    return results

def printsklearnScores(y_true, y_pred, y_prob):
    print(sklearn.metrics.classification_report(y_true, y_pred, digits=5))
    print(sklearn.metrics.confusion_matrix(y_true, y_pred))
    print("Classification accuracy: %0.6f" % sklearn.metrics.accuracy_score(y_true, y_pred))
    print("Log Loss: %0.6f" % sklearn.metrics.log_loss(y_true, y_prob))
    
    
# Takes a .csv scorecard file, compares the results to the preview test set UUID,Class file,
# and prints the scores.
def score(resultsFile):

    testSetFile = 'private_list_primary_v3_testset_preview_uuid_class_29june_2017.csv'

    actual_uuid = csv.DictReader(open(testSetFile))
    actual_uuid_list_sorted = sorted(actual_uuid, key=lambda k: k['UUID'])

    classifier_results = csv.DictReader(open(resultsFile), fieldnames=fieldnames)
    classifier_results_list_sorted = sorted(classifier_results, key=lambda k: k['uuid'])

    y_true = [x['SIGNAL_CLASSIFICATION'] for x in actual_uuid_list_sorted]
    y_pred = [x['SIGNAL_CLASSIFICATION'] for x in classChooser(classifier_results_list_sorted)]
    y_prob = [[float(row[cl]) for cl in class_list] for row in classifier_results_list_sorted]

    printsklearnScores(y_true, y_pred, y_prob)
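
For a quick sanity check of classChooser, which simply picks the highest-probability class in each row, you can feed it a single made-up row (the UUID below is hypothetical):

fake_row = {'uuid': 'abc-123', 'brightpixel': '0.10', 'narrowband': '0.60',
            'narrowbanddrd': '0.05', 'noise': '0.05', 'squarepulsednarrowband': '0.10',
            'squiggle': '0.05', 'squigglesquarepulsednarrowband': '0.05'}
print(classChooser([fake_row]))
# [{'UUID': 'abc-123', 'SIGNAL_CLASSIFICATION': 'narrowband'}]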

Scoring a Scorecard

The Preview test data set can be obtained in the Step 1. Get Data notebook. Using your trained model, you can generate a scorecard for this preview test data set. Your scorecard must be a CSV file with 8 columns: the first column contains the UUID, and the next 7 contain your model's probability estimates for each of the classes. See the Judging Information notebook for more information.
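
As a sketch of that format, here is one way to write a scorecard row with csv.DictWriter and the fieldnames defined above. The file name, UUID, and probabilities are made up for illustration; note that score() reads the scorecard with explicit fieldnames, so no header row is written:

with open('my_scorecard.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    row = {'uuid': '1a2b3c4d-0000-0000-0000-000000000000'}  # hypothetical UUID
    probs = [0.05, 0.70, 0.05, 0.05, 0.05, 0.05, 0.05]      # one estimate per class, summing to 1
    row.update(dict(zip(class_list, probs)))
    writer.writerow(row)  # no header row, to match how score() reads the file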

Now you can score the scorecard using this code. (We are now providing the Preview test set key so that you can easily produce your own confusion matrix and scores. Of course, this also gives you the exact answers for the preview test set.)


Using the Example Scorecard.

On the Judging Information notebook there is a link to download an example scorecard.


In [5]:
score('example_scorecard_codechallenge_v3_testset_preview.csv')


                                precision    recall  f1-score   support

                   brightpixel    0.12349   0.12812   0.12577       320
                    narrowband    0.16467   0.14474   0.15406       380
                 narrowbanddrd    0.19274   0.19885   0.19574       347
                         noise    0.12392   0.12798   0.12592       336
        squarepulsednarrowband    0.16298   0.17302   0.16785       341
                      squiggle    0.18106   0.17711   0.17906       367
squigglesquarepulsednarrowband    0.13043   0.13003   0.13023       323

                   avg / total    0.15525   0.15493   0.15495      2414

[[41 51 50 38 35 60 45]
 [47 55 61 65 48 56 48]
 [42 46 69 52 44 42 52]
 [47 47 56 43 53 41 49]
 [42 42 43 47 59 56 52]
 [60 42 45 64 57 65 34]
 [53 51 34 38 66 39 42]]
Classification accuracy: 0.154930
Log Loss: 2.230604


Using the Preview Test Set Key.

A scorecard built from the preview test set UUID,class CSV file will, of course, score perfectly. From that file I created private_list_primary_v3_testset_preview_scoreboard_key_29june_2017.csv; this scorecard produces a perfect score.
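
A minimal sketch of how such a key scorecard can be built, assuming the key file has UUID and SIGNAL_CLASSIFICATION columns as in score() above: give each UUID a probability of 1.0 for its true class and 0.0 everywhere else.

with open('my_key_scorecard.csv', 'w') as out:
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    key_file = 'private_list_primary_v3_testset_preview_uuid_class_29june_2017.csv'
    for row in csv.DictReader(open(key_file)):
        scores = {cl: (1.0 if cl == row['SIGNAL_CLASSIFICATION'] else 0.0) for cl in class_list}
        scores['uuid'] = row['UUID']
        writer.writerow(scores)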


In [6]:
#Test with the scoreboard key. This should get 100% accuracy
score('private_list_primary_v3_testset_preview_scoreboard_key_29june_2017.csv')


                                precision    recall  f1-score   support

                   brightpixel    1.00000   1.00000   1.00000       320
                    narrowband    1.00000   1.00000   1.00000       380
                 narrowbanddrd    1.00000   1.00000   1.00000       347
                         noise    1.00000   1.00000   1.00000       336
        squarepulsednarrowband    1.00000   1.00000   1.00000       341
                      squiggle    1.00000   1.00000   1.00000       367
squigglesquarepulsednarrowband    1.00000   1.00000   1.00000       323

                   avg / total    1.00000   1.00000   1.00000      2414

[[320   0   0   0   0   0   0]
 [  0 380   0   0   0   0   0]
 [  0   0 347   0   0   0   0]
 [  0   0   0 336   0   0   0]
 [  0   0   0   0 341   0   0]
 [  0   0   0   0   0 367   0]
 [  0   0   0   0   0   0 323]]
Classification accuracy: 1.000000
Log Loss: 0.000000

Winning Team's Scorecard.

I've included in this repository the winning team's scorecard that was submitted to the preview test set scoreboard on July 21. The scores for that scorecard are shown below.


In [7]:
score("results_Effsubsee_best_preview_test_set.csv")


                                precision    recall  f1-score   support

                   brightpixel    0.99262   0.84062   0.91032       320
                    narrowband    0.98592   0.92105   0.95238       380
                 narrowbanddrd    0.94693   0.97695   0.96170       347
                         noise    0.79433   1.00000   0.88538       336
        squarepulsednarrowband    0.96923   0.92375   0.94595       341
                      squiggle    0.99178   0.98638   0.98907       367
squigglesquarepulsednarrowband    0.98738   0.96904   0.97813       323

                   avg / total    0.95326   0.94615   0.94693      2414

[[269   0   0  51   0   0   0]
 [  0 350  18   7   5   0   0]
 [  1   1 339   6   0   0   0]
 [  0   0   0 336   0   0   0]
 [  1   4   1  20 315   0   0]
 [  0   0   0   1   0 362   4]
 [  0   0   0   2   5   3 313]]
Classification accuracy: 0.946147
Log Loss: 0.188138

How to score your own test set

You can use the scoring functions above with your own test set, split off from the training data, to measure your model's performance. This lets you iterate on different models and model parameters quickly while reserving the preview test set for your nearly completed model.

The following code will

  • show how to split the training data into a training set and test set
  • create some fake models to produce some predicted values for the test set
  • pass those predicted values to the printsklearnScores function above

1. Split Up the Data

First, let's split our data into a training set and a test set. We start with the primary small index file.


In [8]:
indexfile = 'public_list_primary_v3_small_21june_2017.csv'
indexfile_uuid = csv.DictReader(open(indexfile))
indexfile_uuid_list = [x for x in indexfile_uuid]
indexfile_uuid_list = sorted(indexfile_uuid_list, key=lambda k: k['UUID'])

X = [x['UUID'] for x in indexfile_uuid_list]
y = [class_list.index(x['SIGNAL_CLASSIFICATION']) for x in indexfile_uuid_list] #also convert from class name to a number

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)

2. Train Your Model

In normal operation, you'd then use the X_train set of UUIDs to grab the <UUID>.dat data files and produce spectrograms and features. You'd then pass your features, along with y_train, which contains the labels, to your model for training.
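
As a rough sketch of that step, the raw samples can be turned into a spectrogram with numpy. The file location, complex64 sample format, and 32 x 6144 shape below are assumptions for illustration; consult the Step 1. Get Data notebook for the actual data format.

def uuid_to_spectrogram(uuid, width=6144, height=32):
    # Hypothetical path and format -- adjust to wherever your .dat files live
    samples = np.fromfile('primary_small/%s.dat' % uuid, dtype=np.complex64)
    samples = samples[:width * height].reshape(height, width)
    # Power spectrum of each time slice, with DC shifted to the center
    spectrogram = np.abs(np.fft.fftshift(np.fft.fft(samples, axis=1), axes=1)) ** 2
    return np.log(spectrogram)  # log power is a common feature scaling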

Below, I've coded up two FAKE models. The randomModel produces random probabilities. The perfectModel actually uses the known values in y_test, so it will produce a perfect score.


In [9]:
from sklearn.preprocessing import LabelBinarizer

# Example classes
# Your class, of course, would have actual code in the `train` functions and
# the predict function would also be different.

class randomModel(object):
    def __init__(self):
        pass

    def train(self, X_train, y_train):
        ## do whatever
        pass

    def predict(self, X_test):
        # Random probabilities, normalized so that each row sums to 1
        y_prob = np.random.rand(len(X_test), len(class_list))
        return (y_prob.T / y_prob.sum(axis=1)).T


class perfectModel(object):
    def __init__(self):
        pass

    def train(self, X_train, y_train):
        ## train
        pass

    def predict(self, X_test):
        # Cheats by one-hot encoding the known labels in the global y_test,
        # so it scores perfectly by construction
        encoder = LabelBinarizer()
        ytest_np = np.array(y_test).reshape(-1, 1)
        return encoder.fit_transform(ytest_np)

3. Make Predictions and Score

Next, you'd take the X_test set of UUIDs, extract the necessary spectrograms and features, and pass them to your model to predict their classes. We use the two fake models from above: perfectModel and randomModel.

Each model.predict function returns a 2D array, M x K, where M is the number of samples in the test set passed into the function and K is the number of classes. Each row holds the predicted class probabilities for one sample. Your model's log loss and classification accuracy should fall somewhere between the scores of these two extremes.
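
As a point of reference (simple arithmetic, not output from this notebook): a model that always predicts the uniform distribution over the 7 classes has a log loss of ln(7) ≈ 1.9459, and randomModel scores somewhat worse than that because its rows deviate from uniform.

print(np.log(7))   # ~1.945910, the log loss of an always-uniform predictor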


In [10]:
mRandModel = randomModel()
mRandModel.train(X_train, y_train)
y_prob = mRandModel.predict(X_test)
y_true = [class_list[i] for i in y_test]
y_pred = [class_list[probarray.argmax()] for probarray in y_prob]

print('The randomModel class produces random probability estimates')
print(y_prob[:5])
print('')

printsklearnScores(y_true, y_pred, y_prob)


The randomModel class produces random probability estimates
[[ 0.20222191  0.15488604  0.04116172  0.05002166  0.1117629   0.2144418
   0.22550396]
 [ 0.03256574  0.22902783  0.22244701  0.15240424  0.20808028  0.04013136
   0.11534354]
 [ 0.26182883  0.10909254  0.09191382  0.02477715  0.11923854  0.24349684
   0.14965228]
 [ 0.06927687  0.13536934  0.2392518   0.24012245  0.0205762   0.12298141
   0.17242193]
 [ 0.1835864   0.20328221  0.02019067  0.05209727  0.1254207   0.20358192
   0.21184083]]

                                precision    recall  f1-score   support

                   brightpixel    0.17544   0.19417   0.18433       103
                    narrowband    0.17000   0.17000   0.17000       100
                 narrowbanddrd    0.12766   0.11765   0.12245       102
                         noise    0.14019   0.17241   0.15464        87
        squarepulsednarrowband    0.21429   0.17822   0.19459       101
                      squiggle    0.20721   0.22549   0.21596       102
squigglesquarepulsednarrowband    0.20000   0.17143   0.18462       105

                   avg / total    0.17724   0.17571   0.17571       700

[[20 12 11 14 12 15 19]
 [22 17 14 18 10 10  9]
 [14 15 12 16 15 19 11]
 [12 13 15 15  6 14 12]
 [13 15 15  9 18 16 15]
 [19 14 13 20  7 23  6]
 [14 14 14 15 16 14 18]]
Classification accuracy: 0.175714
Log Loss: 2.274388

In [11]:
mPerfectModel = perfectModel()
mPerfectModel.train(X_train, y_train)

y_prob = mPerfectModel.predict(X_test)
y_true = [class_list[i] for i in y_test]
y_pred = [class_list[probarray.argmax()] for probarray in y_prob]

print(y_prob[:5])

printsklearnScores(y_true, y_pred, y_prob)


[[0 0 0 0 1 0 0]
 [0 0 0 0 0 0 1]
 [0 0 0 1 0 0 0]
 [0 0 0 0 1 0 0]
 [0 0 0 1 0 0 0]]
                                precision    recall  f1-score   support

                   brightpixel    1.00000   1.00000   1.00000       103
                    narrowband    1.00000   1.00000   1.00000       100
                 narrowbanddrd    1.00000   1.00000   1.00000       102
                         noise    1.00000   1.00000   1.00000        87
        squarepulsednarrowband    1.00000   1.00000   1.00000       101
                      squiggle    1.00000   1.00000   1.00000       102
squigglesquarepulsednarrowband    1.00000   1.00000   1.00000       105

                   avg / total    1.00000   1.00000   1.00000       700

[[103   0   0   0   0   0   0]
 [  0 100   0   0   0   0   0]
 [  0   0 102   0   0   0   0]
 [  0   0   0  87   0   0   0]
 [  0   0   0   0 101   0   0]
 [  0   0   0   0   0 102   0]
 [  0   0   0   0   0   0 105]]
Classification accuracy: 1.000000
Log Loss: 0.000000
